Discussion of "Model-based clustering and classification with non-normal mixture distributions" by S.X. Lee and G.J. McLachlan

Authors

  • Giuliano Galimberti
  • Angela Montanari
Abstract

It is a great pleasure to have the chance to read and comment on this very interesting paper, which provides a unified view of non-Gaussian mixture models. This is a very active topic that has recently received increasing attention in the literature. The paper is especially welcome as it offers the reader an up-to-date review, with interesting stimuli for reflection and further insight.

We would like to comment on the clustering side of the work. The many examples discussed in the paper show how the choice of the distributional shape for the mixture components can affect the clustering performance of the corresponding mixture model. The clustering results are assessed by comparison with a priori known information about group membership, through the Adjusted Rand Index (ARI) or the misclassification rate. In real applications, however, the group structure is unknown and the researcher needs to derive the “best” clustering with no a priori information on group membership. In the literature on model-based clustering, this problem is often viewed as a model selection problem, and likelihood-based criteria such as BIC or ICL are usually suggested. We wonder whether such a strategy can effectively lead to the recovery of the “true” group structure, and we try to give an answer through a simple simulation study (performed using the R packages EMMIX-skew and EMMIX-uskew described in the paper). In particular, we focus on the selection of the shape of the mixture components by comparing restricted skew t finite mixtures with unrestricted ones (which in the examples of the paper seem to give the best results).

Starting from the Australian Institute of Sport data (Section 5.2 in Lee and McLachlan’s paper), a two-component mixture of unrestricted skew t distributions is fitted and the corresponding parameter estimates are used to simulate 500 datasets (each with sample size 202, i.e. the same size as the original dataset). For each unit of these datasets, the generating mixture component is recorded and assumed to be the “true” class. Afterwards, on each sample, a two-component mixture model is fitted with both restricted and unrestricted skew t components, thus leading to two partitions of the units into two groups. The agreement between each clustering result and the corresponding “true” classification is evaluated through the ARI.
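The agreement measure used in this comparison can be illustrated with a minimal sketch. The Python function below computes the Adjusted Rand Index in its usual pair-counting (Hubert and Arabie) form from two label vectors. It is not taken from the EMMIX-skew or EMMIX-uskew packages, and the names `adjusted_rand_index`, `labels_true` and `labels_pred` are hypothetical choices made for this illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two partitions of the same units.

    Illustrative helper (hypothetical name), using the pair-counting form:
    (index - expected index) / (maximum index - expected index).
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two units: no pairs to compare, treat as full agreement.
        return 1.0

    # Contingency table cells and its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of unit pairs placed together in both, in the first,
    # and in the second partition.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        # Degenerate case, e.g. both partitions consist of a single group.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

In a simulation like the one described, `labels_true` would hold the recorded generating components and `labels_pred` the partition obtained from a fitted mixture, so that each simulated dataset yields one ARI value per fitted model. The index equals 1 for identical partitions (up to relabelling of the groups) and is close to 0 for agreement at chance level.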




Journal:
  • Statistical Methods and Applications

Volume 22

Publication date: 2013